Abstract

This paper studies the change point problem for a general parametric, univariate or multivariate
family of distributions. An information theoretic procedure is developed which is based on general
divergence measures for testing the hypothesis of the existence of a change. To assess the
accuracy of the new test-statistic, a simulation study is performed for the special case of a univariate
discrete model. Finally, the procedure proposed in this paper is illustrated through a classical
change-point example.
1 Introduction

The change point problem has been considered and studied by several authors over the last five decades.
Change point analysis is a statistical tool for determining whether a change has taken place at a
point of a sequence of observations, such that the observations are described by one distribution up
to that point and by another distribution after that point. Change-point analysis is concerned with the
detection and estimation of the point at which the distribution changes. Both the single change point problem
and the multiple change points problem have been studied in the literature, depending on whether one or
more change points are observed in a sequence of random variables. Several methods, parametric
or non-parametric, have been developed to approach the solution of this problem, and the range of
applications of change point analysis is broad. Applications can be encountered in many areas such as
statistical quality control, public health, medicine, finance, biomedical signal processing, meteorology
and seismology. The monograph by Chen and Gupta (2000) summarizes recent developments in
parametric change-point analysis.

Typical situations encountered in the literature of parametric multiple change points analysis are as
follows. Let $X_1, X_2, \ldots, X_K$ be $K$ independent $d$-variate observations ($d \in \mathbb{N}$) and let
$(\mathcal{X}^{(i)}, \beta_{\mathcal{X}}, P_{\theta})_{\theta\in\Theta}$ be the statistical space associated with the random variable (r.v.) $X_i$, $i = 1, \ldots, K$.
The probability density function with respect to a $\sigma$-finite measure $\mu$ is given by
$f_{\theta_i}(x) = \frac{dP_{\theta_i}}{d\mu}(x)$, $\theta_i \in \Theta \subset \mathbb{R}^m$, $i = 1, \ldots, K$, $x \in \mathbb{R}^d$.
For simplicity, $\mu$ is either the Lebesgue measure or a counting measure.
We adopt in the sequel the formulation of the multiple change point problem as it appeared in
Srivastava and Worsley (1986) and Chen and Gupta (2000, 2004). Following these authors, suppose
that adjacent observations are grouped in $q$ groups, so that $X_1, X_2, \ldots, X_{k_1}$ are in the first
group, $X_{k_1+1}, X_{k_1+2}, \ldots, X_{k_2}$ are in the second group, and we continue in a similar manner until
$X_{k_{q-1}+1}, X_{k_{q-1}+2}, \ldots, X_{k_q} = X_K$ are in the $q$-th group.

Consider the model for changes in the parameters. This is formulated as a problem of testing the
following hypotheses,

\[ H_0: \theta_1 = \theta_2 = \cdots = \theta_K \ (= \theta_0,\ \theta_0 \text{ unknown}), \tag{1} \]

versus the alternative

\[ H_1: \theta_1 = \cdots = \theta_{k_1} \neq \theta_{k_1+1} = \cdots = \theta_{k_2} \neq \cdots \neq \theta_{k_{q-1}+1} = \cdots = \theta_{k_q} = \theta_K, \]

where $q$, $1 < q < K$, is the unknown number of changes and $k_1, k_2, \ldots, k_q$ are the unknown positions
of the change points. The above hypotheses can be equivalently stated in the form

\[ H_0: X_i \text{ are described by } f_{\theta_0},\ i = 1, \ldots, K, \text{ and } \theta_0 \text{ unknown}, \tag{2} \]

versus the alternative

\[ H_1: X_{k_j+1}, X_{k_j+2}, \ldots, X_{k_{j+1}},\ j = 0, \ldots, q-1, \text{ are described by } f_{\theta_{k_{j+1}}}, \]

with $X_{k_q} = X_K$.

There is an extensive bibliography on the subject and several methods to address the change point
problem have appeared in the literature. Among them are the generalized likelihood ratio test, the Bayesian
solution of the problem, information criterion approaches and the cumulative sum method. Based on
these methods, several papers discuss change-point problems in specific probabilistic models, like
the univariate and multivariate normal distribution, the gamma model and the exponential model.
For instance, Sen and Srivastava (1980) focused on the single change-point problem. Moreover, they
consider that within each section the distributions are the same, while the distribution in a section
differs from that in the preceding and the following section in mean vector or covariance matrix.
For an exposition of these methods and their application to specific distributions we refer to the
monograph and the survey paper by Chen and Gupta (2000, 2001) and the references therein.

It has been proposed in these and other treatments (cf., for instance, Vostrikova (1981)) that, in
order to study the multiple change point problem formulated by (1) or (2), we just need to
test the single change point hypothesis and then repeat the procedure for each subsequence. Hence,
we turn to the testing of (2) against the alternative

\[ H_1: X_i \equiv f_{\theta_0},\ i = 1, \ldots, \kappa, \text{ and } X_i \equiv f_{\theta_1},\ i = \kappa + 1, \ldots, K, \tag{3} \]

where the symbol $\equiv$ is used to denote that the observations on the left follow the parametric density
on the right. In (3), $\kappa$ represents the position of a single change point, which is supposed to be unknown.
A general description of this technique for the detection of changes is summarized in the following
steps by Chen and Gupta (2001). First we test for no change point versus one change point, that is,
we test the null hypothesis given by (2) versus the alternative given by (3), equivalently stated as
$H_1: \theta_1 = \cdots = \theta_{\kappa} \neq \theta_{\kappa+1} = \cdots = \theta_K$. Here, $\kappa$ is the unknown location of the single change point. If
$H_0$ is not rejected, then the procedure is finished and there is no change point. If $H_0$ is rejected, then
there is a change point and we continue with step 2. In the second step we test separately the two
subsequences before and after the change point found in the first step for a change. We then
repeat these two steps until no further subsequences have change points. At the end of the procedure,
the collection of change point locations found by the previous steps constitutes the set of change
points.
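The two-step recursion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `binary_segmentation`, `detect` and `toy_mean_shift_detector` are hypothetical names, and the toy detector (largest gap between left and right sample means compared with a fixed threshold) merely stands in for a proper single change-point test such as the one developed in this paper.

```python
from typing import Callable, List, Optional, Sequence


def binary_segmentation(
    data: Sequence[float],
    detect: Callable[[Sequence[float]], Optional[int]],
) -> List[int]:
    """Collect change points by recursively splitting `data`.

    `detect(segment)` is a hypothetical single change-point test: it returns
    None when H0 (no change) is not rejected, or k with 0 < k < len(segment),
    meaning that the first k observations form the left block.
    """
    found: List[int] = []

    def recurse(lo: int, hi: int) -> None:
        if hi - lo < 2:  # a single observation cannot be split
            return
        k = detect(data[lo:hi])
        if k is None:  # step 1: H0 not rejected, stop on this segment
            return
        if not 0 < k < hi - lo:
            raise ValueError("detect must return an interior position")
        found.append(lo + k)
        recurse(lo, lo + k)  # step 2: test the part before the change
        recurse(lo + k, hi)  # step 2: test the part after the change

    recurse(0, len(data))
    return sorted(found)


def toy_mean_shift_detector(
    segment: Sequence[float], threshold: float = 1.0
) -> Optional[int]:
    """Illustrative detector only: largest gap between left and right means."""
    n = len(segment)
    best_k, best_gap = None, threshold
    for k in range(1, n):
        gap = abs(sum(segment[:k]) / k - sum(segment[k:]) / (n - k))
        if gap > best_gap:
            best_k, best_gap = k, gap
    return best_k
```

For the toy sequence `[0]*5 + [3]*5 + [0]*5`, calling `binary_segmentation(..., toy_mean_shift_detector)` returns `[5, 10]`, i.e. the positions after which the level changes.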

The subject of change point analysis is twofold. On the one hand, it aims to detect whether there are one or more
changes in a sequence of observations. On the other hand, it deals with the estimation of
the number of changes and their corresponding locations. In this paper we develop an information
theoretic procedure, based on divergence measures, in order to study the change point problem. The
background is a general parametric, univariate or multivariate family of distributions. We
describe formally the framework and the problem in Section 2, and the main results are presented in
Section 3. In Section 4 we focus on a specific distribution, the binomial distribution, and
a simulation study is performed in order to compare the accuracy of the new test-statistic with some
pre-existing test-statistics. In the final Section 5 the general results of this paper are illustrated by
means of the well-known Lindisfarne scribes data set.

2 Information theoretic procedure 

Consider now the single change point problem, that is, the problem of testing the pair of hypotheses

\[ H_0: X_i \equiv f_{\theta_0},\ i = 1, \ldots, K, \tag{4a} \]

\[ H_1: X_i \equiv f_{\theta_0},\ i = 1, \ldots, \kappa, \text{ and } X_i \equiv f_{\theta_1},\ i = \kappa + 1, \ldots, K, \tag{4b} \]

which are presented by (2) and (3), respectively. In the above formulation, $\theta_0$ and $\theta_1$ are unknown.
Since $\kappa$ is the unknown location of the single change point, we will consider all the candidate points
$k \in \{1, \ldots, K-1\}$. Let $\hat\theta_{0,k}^{(K)}$ denote the maximum likelihood estimator (MLE) of $\theta_0$ which is based
on the random sample $X_1, \ldots, X_k$ from $f_{\theta_0}$, and let $\hat\theta_{1,k}^{(K)}$ denote the MLE of $\theta_1$ which is based on
the random sample $X_{k+1}, \ldots, X_K$ from $f_{\theta_1}$. If the hypothesis $H_1$ is true, then there is a difference
between the probabilistic models $f_{\hat\theta_{0,k}^{(K)}}$ and $f_{\hat\theta_{1,k}^{(K)}}$, which causes a large value for a measure of the
distance between $f_{\hat\theta_{0,k}^{(K)}}$ and $f_{\hat\theta_{1,k}^{(K)}}$. Given that the $\phi$-divergence is a broad family of distance measures
between probability distributions, the $\phi$-divergence between $f_{\hat\theta_{0,k}^{(K)}}$ and $f_{\hat\theta_{1,k}^{(K)}}$ is large if $H_1$ is true, and
hence it can be used in order to decide if the candidate point $k$ in (4b) is a change point ($k = \kappa$).

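Taking into account that the MLEs $\hat\theta_{0,k}^{(K)}$ and $\hat\theta_{1,k}^{(K)}$ of $\theta_0$ and $\theta_1$, respectively, depend on the candidate
change point $k$, we will adopt the following notation for the $\phi$-divergence between $f_{\hat\theta_{0,k}^{(K)}}$ and $f_{\hat\theta_{1,k}^{(K)}}$,

\[ D_\phi^{(k)} = D_\phi\big(f_{\hat\theta_{0,k}^{(K)}}, f_{\hat\theta_{1,k}^{(K)}}\big) = \int f_{\hat\theta_{1,k}^{(K)}}(x)\, \phi\!\left( \frac{f_{\hat\theta_{0,k}^{(K)}}(x)}{f_{\hat\theta_{1,k}^{(K)}}(x)} \right) d\mu(x), \tag{5} \]

provided that the convex function $\phi$ satisfies some additional conditions (see page 408 in Pardo (2006))
which ensure the existence of the above integral. Moreover, we consider convex functions $\phi$ which
satisfy $\phi(1) = 0$ and $\phi''(1) \neq 0$. Large values of $D_\phi^{(k)}$ support the existence of a change point and
therefore large values of $D_\phi^{(k)}$ suggest rejection of the null hypothesis $H_0$. Hence $D_\phi^{(k)}$ can be used as
a test statistic for testing the hypotheses (4a) and (4b). Then, motivated by the fact that large values of $D_\phi^{(k)}$
are in favor of $H_1$, a test for the existence of a single change point, that is, for the hypotheses (4a) and (4b),
should be based on the $\phi$-divergence test statistic

\[ T_\phi^{(K)} = \max_{k \in \{1, \ldots, K-1\}} T_\phi^{(K)}(k), \tag{6} \]

where

\[ T_\phi^{(K)}(k) = \frac{k(K-k)}{K}\, \frac{2}{\phi''(1)}\, D_\phi^{(k)}. \tag{7} \]

Moreover, the unknown position of the change point $\kappa$ is estimated by $\hat\kappa_\phi$ such that

\[ \hat\kappa_\phi = \arg\max_{k \in \{1, \ldots, K-1\}} T_\phi^{(K)}(k) = \arg\max_{k \in \{1, \ldots, K-1\}} \frac{k(K-k)}{K}\, D_\phi\big(f_{\hat\theta_{0,k}^{(K)}}, f_{\hat\theta_{1,k}^{(K)}}\big). \tag{8} \]

Based on the above discussion, $H_0$ in (4a) is rejected for $T_\phi^{(K)} > c$, where $c$ is a constant
determined by the null distribution of $T_\phi^{(K)}$. Hence, in order to use $T_\phi^{(K)}$ of (6) for testing hypotheses
(4a) and (4b), knowledge of the distribution of $T_\phi^{(K)}$ under $H_0$ is necessary.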
There are two important reasons why working directly with the test-statistic $T_\phi^{(K)}$ defined in (6) is
avoided. On the one hand, its asymptotic distribution, $\sup_{t \in (0,1)} \frac{1}{t(1-t)} W_0^{(m)}(t)$, is not an easy to handle
random variable (see for instance Theorems 1.2 and 1.3 in Gombay and Horváth (1989)). On the
other hand, in practice cases such that $\kappa \in \{1, K-1\}$ are very difficult to detect. Let $N(\epsilon)$ be the
set of all possible integers $k \in \{1, \ldots, K-1\}$ such that $\frac{k}{K} \in [\epsilon, 1-\epsilon]$, with $\epsilon > 0$ small enough. We shall
modify (6) to be maximized in $N(\epsilon)$, i.e.

\[ {}^{\epsilon}T_\phi^{(K)} = \max_{k \in N(\epsilon)} T_\phi^{(K)}(k), \tag{9} \]

and in the same manner (8) becomes

\[ {}^{\epsilon}\hat\kappa_\phi = \arg\max_{k \in N(\epsilon)} T_\phi^{(K)}(k) = \arg\max_{k \in N(\epsilon)} \frac{k(K-k)}{K}\, D_\phi\big(f_{\hat\theta_{0,k}^{(K)}}, f_{\hat\theta_{1,k}^{(K)}}\big). \tag{10} \]
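The restriction of the maximization to $N(\epsilon)$ in (9) and (10) can be sketched as follows. This is a minimal illustration, assuming that the per-candidate statistic is supplied as a callable; the names `trimmed_candidates`, `trimmed_scan` and `stat` are hypothetical and not part of the paper.

```python
from typing import Callable, List, Tuple


def trimmed_candidates(K: int, eps: float) -> List[int]:
    """Candidate set N(eps): integers k in {1, ..., K-1} with eps <= k/K <= 1-eps."""
    return [k for k in range(1, K) if eps <= k / K <= 1.0 - eps]


def trimmed_scan(
    K: int, eps: float, stat: Callable[[int], float]
) -> Tuple[float, int]:
    """Return (maximum of stat over N(eps), maximizing k), as in (9) and (10).

    `stat(k)` stands for the per-candidate statistic; it is supplied by the
    caller and is an assumption of this sketch.
    """
    candidates = trimmed_candidates(K, eps)
    if not candidates:
        raise ValueError("N(eps) is empty for this K and eps")
    k_hat = max(candidates, key=stat)
    return stat(k_hat), k_hat
```

With `K = 64` and `eps = 0.05`, the condition $k/K \in [\epsilon, 1-\epsilon]$ as coded here keeps the integers 4 to 60.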

3 Main result 

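In order to get the asymptotic distribution of the family of test statistics $T_\phi^{(K)}$, given in (6), we
shall assume the usual regularity assumptions for the multiparameter Central Limit Theorem (see for
instance Theorem 5.2.2 in Sen and Singer (1993)):

(i) The parameter space, $\Theta$, is either $\mathbb{R}^m$ or a rectangle in $\mathbb{R}^m$.

(ii) For all $\theta \neq \theta' \in \Theta$, $\mu\big(\{x \in \mathcal{X} : f_{\theta}(x) \neq f_{\theta'}(x)\}\big) > 0$.

(iii) For $\theta = (\theta_1, \ldots, \theta_m)^T$, the derivatives
$\frac{\partial}{\partial\theta_i} f_\theta(x)$ and $\frac{\partial^2}{\partial\theta_i \partial\theta_j} f_\theta(x)$, $i, j \in \{1, \ldots, m\}$,
exist almost everywhere and are such that

\[ \left| \frac{\partial}{\partial\theta_i} f_\theta(x) \right| \le H_i(x) \quad \text{and} \quad \left| \frac{\partial^2}{\partial\theta_i \partial\theta_j} f_\theta(x) \right| \le G_{ij}(x), \quad i, j \in \{1, \ldots, m\}, \]

where

\[ \int H_i(x)\, d\mu(x) < \infty \quad \text{and} \quad \int G_{ij}(x)\, d\mu(x) < \infty, \quad i, j \in \{1, \ldots, m\}. \]

(iv) Denoting $\ell(x; \theta) = \log f_\theta(x)$, the derivatives
$\frac{\partial}{\partial\theta_i} \ell(x; \theta)$ and $\frac{\partial^2}{\partial\theta_i \partial\theta_j} \ell(x; \theta)$, $i, j \in \{1, \ldots, m\}$,
exist almost everywhere and are such that the Fisher information matrix is finite and positive
definite. In addition, $\lim_{\delta \to 0} \psi(\delta) = 0$, where

\[ \psi(\delta) = E_\theta\left[ \sup_{\{h : \|h\| \le \delta\}} \left\| \frac{\partial^2}{\partial\theta\, \partial\theta^T} \ell(X; \theta + h) - \frac{\partial^2}{\partial\theta\, \partial\theta^T} \ell(X; \theta) \right\| \right] \]

and $\|\cdot\|$ is the Euclidean norm.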



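Theorem 1 Under $H_0$ in (4a) and the previous regularity assumptions (i)-(iv), the asymptotic
distribution of ${}^{\epsilon}T_\phi^{(K)}$ is given by

\[ {}^{\epsilon}T_\phi^{(K)} \xrightarrow[K \to \infty]{\mathcal{L}} T_{m,\epsilon}, \tag{11} \]

where $m = \dim(\Theta)$,

\[ T_{m,\epsilon} = \sup_{t \in [\epsilon, 1-\epsilon]} \frac{1}{t(1-t)}\, W_0^{(m)}(t), \tag{12} \]

with $\boldsymbol{W}_0^{(m)}(t) = \{(W_{0,1}(t), \ldots, W_{0,m}(t))^T\}_{t \in [0,1]}$ being an $m$-dimensional vector of independent
Brownian bridges and

\[ W_0^{(m)}(t) = \sum_{i=1}^{m} W_{0,i}^2(t). \]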



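Proof. According to the properties of the MLEs we know that

\[ \sqrt{k}\,\big(\hat\theta_{0,k}^{(K)} - \theta_0\big) \xrightarrow[k \to \infty]{\mathcal{L}} \mathcal{N}\big(0, I_F^{-1}(\theta_0)\big), \qquad \sqrt{K-k}\,\big(\hat\theta_{1,k}^{(K)} - \theta_1\big) \xrightarrow[(K-k) \to \infty]{\mathcal{L}} \mathcal{N}\big(0, I_F^{-1}(\theta_1)\big), \]

where for $\theta \in \Theta$, such that $m = \dim\Theta$,
$I_F(\theta) = \left( -E\left[ \frac{\partial^2}{\partial\theta_i \partial\theta_j} \log f_\theta(X) \right] \right)_{i,j \in \{1, \ldots, m\}}$
is the information matrix. If we consider that $\lambda = \lim_{K \to \infty} \frac{k}{K}$, then

\[ \sqrt{\frac{k(K-k)}{K}}\,\big(\hat\theta_{0,k}^{(K)} - \theta_0\big) \xrightarrow[K \to \infty]{\mathcal{L}} \mathcal{N}\big(0, (1-\lambda) I_F^{-1}(\theta_0)\big), \]

\[ \sqrt{\frac{k(K-k)}{K}}\,\big(\hat\theta_{1,k}^{(K)} - \theta_1\big) \xrightarrow[K \to \infty]{\mathcal{L}} \mathcal{N}\big(0, \lambda I_F^{-1}(\theta_1)\big). \]

Since the two estimators are based on disjoint, independent subsamples, their asymptotic covariance
matrices add. This means that under $H_0$, i.e. $\theta_0 = \theta_1$,

\[ \sqrt{\frac{k(K-k)}{K}}\,\big(\hat\theta_{0,k}^{(K)} - \hat\theta_{1,k}^{(K)}\big) \xrightarrow[K \to \infty]{\mathcal{L}} \mathcal{N}\big(0, I_F^{-1}(\theta_0)\big), \]

and hence we can construct a Wald-type test-statistic as follows

\[ Q_k^{(K)} = \frac{k(K-k)}{K}\, \big(\hat\theta_{0,k}^{(K)} - \hat\theta_{1,k}^{(K)}\big)^T\, \hat{I}_F(\theta_0)\, \big(\hat\theta_{0,k}^{(K)} - \hat\theta_{1,k}^{(K)}\big), \tag{13} \]

where $\hat{I}_F(\theta_0)$ is any consistent estimator of $I_F(\theta_0)$. From Theorem 1 in Hawkins (1987) we know
that

\[ \max_{k \in N(\epsilon)} Q_k^{(K)} \xrightarrow[K \to \infty]{\mathcal{L}} T_{m,\epsilon}. \]

In addition, from Pardo (2006), page 443, we have

\[ T_\phi^{(K)}(k) = \frac{k(K-k)}{K}\, \frac{2}{\phi''(1)}\, D_\phi\big(f_{\hat\theta_{0,k}^{(K)}}, f_{\hat\theta_{1,k}^{(K)}}\big) = Q_k^{(K)} + o_P(1), \]

where

\[ D_\phi\big(f_{\hat\theta_{0,k}^{(K)}}, f_{\hat\theta_{1,k}^{(K)}}\big) = \int f_{\hat\theta_{1,k}^{(K)}}(x)\, \phi\!\left( \frac{f_{\hat\theta_{0,k}^{(K)}}(x)}{f_{\hat\theta_{1,k}^{(K)}}(x)} \right) d\mu(x). \]

With both results we conclude (11). ■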
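The limit law $T_{m,\epsilon}$ in (12) has no simple closed form, so a rough Monte Carlo check of its quantiles can be useful. The sketch below is an illustration only: the function names, the grid size and the number of replications are assumptions, each Brownian bridge is approximated by a Gaussian random walk tied down at $t = 1$, and the supremum is taken over a finite grid, which tends to underestimate the true supremum slightly.

```python
import math
import random
from typing import Optional


def simulate_sup_bessel(
    m: int, eps: float, n_grid: int = 500, rng: Optional[random.Random] = None
) -> float:
    """One draw of sup over t in [eps, 1-eps] of sum_i B_i(t)^2 / (t (1 - t)).

    Each B_i is a Brownian bridge built from a discretized Brownian motion W
    via B(t) = W(t) - t * W(1).
    """
    rng = rng or random.Random()
    step_sd = math.sqrt(1.0 / n_grid)
    paths = []
    for _ in range(m):
        w = [0.0]
        for _ in range(n_grid):
            w.append(w[-1] + rng.gauss(0.0, step_sd))
        paths.append(w)
    best = 0.0
    for i in range(1, n_grid):
        t = i / n_grid
        if t < eps or t > 1.0 - eps:
            continue
        squared = sum((w[i] - t * w[n_grid]) ** 2 for w in paths)
        best = max(best, squared / (t * (1.0 - t)))
    return best


def mc_quantile(
    prob: float,
    m: int = 1,
    eps: float = 0.05,
    reps: int = 2000,
    n_grid: int = 500,
    seed: int = 0,
) -> float:
    """Empirical quantile of order `prob` of the simulated supremum."""
    rng = random.Random(seed)
    draws = sorted(simulate_sup_bessel(m, eps, n_grid, rng) for _ in range(reps))
    return draws[min(reps - 1, int(prob * reps))]
```

With a fine grid and many replications, `mc_quantile(0.95)` can be compared with the tabulated quantiles quoted in Remark 3; the discretization and the Monte Carlo error mean that only rough agreement should be expected.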

Remark 2 If we compare (13) with formula (2.3) in Hawkins (1987), the two apparently are not
equivalent, because in our case $\frac{k(K-k)}{K}$ appears rather than the $k(K-k)$ of formula (2.3). This difference is
associated with the way of understanding the Fisher information matrix; in fact our Wald test-statistic
coincides with the empirical stochastic process denoted by $Q_K(t)$ at the beginning of Section 3 in Hawkins
(1987).



Remark 3 The probability distribution function of the random variable $T_{m,\epsilon}$, for $\epsilon > 0$, given in (12),
can be found in Sen (1981, page 397) and De Long (1981). The computation of the probability
distribution function is complex; however, it is possible to approximate the p-value of the test in which the
distribution of $T_{m,\epsilon}$ is considered under the null hypothesis. In Estrella (2003), for instance,

\[ \widetilde{p\text{-value}}(x, \epsilon) = \frac{1}{\Gamma\left(\frac{m}{2}\right)} \left(\frac{x}{2}\right)^{m/2} \exp\left\{-\frac{x}{2}\right\} \left( \log\left(\frac{(1-\epsilon)^2}{\epsilon^2}\right) \left(1 - \frac{m}{x}\right) + \frac{2}{x} \right), \tag{14} \]

with $\Gamma(\cdot)$ being the Gamma function, is proposed as an approximation of

\[ p\text{-value}(x, \epsilon) = \Pr\left(T_{m,\epsilon} > x\right). \]

When calibrating the approximation for the univariate parameter ($m = 1$), we can take into account
that the exact quantiles of order $(1-\alpha) \in \{0.90, 0.95, 0.99\}$ for $\epsilon = 0.05$ are 8.31, 9.90 and 13.45,
respectively, i.e. $p\text{-value}(8.31, 0.05) = 0.1$, $p\text{-value}(9.90, 0.05) = 0.05$, $p\text{-value}(13.45, 0.05) = 0.01$.
If we use (14) with $\epsilon = 0.05$ and the aforementioned quantiles, we obtain
$\widetilde{p\text{-value}}(8.31, 0.05) = 9.7789 \times 10^{-2}$, $\widetilde{p\text{-value}}(9.90, 0.05) = 4.8868 \times 10^{-2}$ and
$\widetilde{p\text{-value}}(13.45, 0.05) = 9.8358 \times 10^{-3}$. We can
see that, in particular, $\widetilde{p\text{-value}}(x, 0.05)$ approximates $p\text{-value}(x, 0.05)$ very well when $x$ is the quantile
of order $1 - \alpha = 0.99$, which is in practice of major interest.
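The approximation (14) is straightforward to evaluate numerically. The following sketch assumes the form of (14) that reproduces the three values quoted in this remark, namely a leading term $(x/2)^{m/2} e^{-x/2} / \Gamma(m/2)$ multiplied by $\log((1-\epsilon)^2/\epsilon^2)(1 - m/x) + 2/x$; the function name `estrella_pvalue` is hypothetical.

```python
import math


def estrella_pvalue(x: float, eps: float, m: int = 1) -> float:
    """Approximate Pr(T_{m,eps} > x) following the form of (14).

    x   : observed value of the test statistic (must be positive)
    eps : trimming proportion, 0 < eps < 0.5
    m   : dimension of the parameter
    """
    if x <= 0.0:
        raise ValueError("x must be positive")
    if not 0.0 < eps < 0.5:
        raise ValueError("eps must lie in (0, 0.5)")
    ratio = (1.0 - eps) ** 2 / eps ** 2
    lead = (x / 2.0) ** (m / 2.0) * math.exp(-x / 2.0) / math.gamma(m / 2.0)
    return lead * (math.log(ratio) * (1.0 - m / x) + 2.0 / x)


# Calibration points quoted in the remark (m = 1, eps = 0.05).
for quantile in (8.31, 9.90, 13.45):
    print(quantile, estrella_pvalue(quantile, 0.05))
```

The approximation is intended for the upper tail; for small `x` the expression is not guaranteed to lie in $[0, 1]$, so a caller may want to clip the result.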

4 Simulation Study 

In this section we focus on change point analysis for a particular discrete probability
model, the binomial model. For this special case we give an explicit expression for divergence based
test-statistics. Their accuracy is compared by simulation with that of pre-existing test-statistics.
In this context, suppose we are dealing with a sequence of independent r.v.'s $X_i \sim \mathrm{Bin}(n_i, \theta_i)$,
$i = 1, \ldots, K$, for which we are interested in testing (1). In order to do that we consider a
sequence of independent Bernoulli r.v.'s $X_{ih} \sim \mathrm{Ber}(\theta_i)$, $i = 1, \ldots, K$, $h = 1, \ldots, n_i$, whose probability
mass function (p.m.f.) is given by $p_{\theta_i}(x) = \theta_i^{x}(1-\theta_i)^{1-x}$, $x \in \{0, 1\}$, and $p_{\theta_i}(x) = 0$, $x \notin \{0, 1\}$. If we
denote the cumulative steps between consecutive binomial r.v.'s by

\[ N_k = \sum_{i=1}^{k} n_i, \]

the change points are located at $\{1, 2, \ldots, N_K - 1, N_K\}$ for $X_{ih}$ and at $\{N_k\}_{k=1}^{K}$ for $X_i$. Hence, $X_{ih}$ is
the only sequence of r.v.'s which are strictly identically distributed, but the change points of interest
are located in $\{N_k\}_{k=1}^{K} \subset \{1, 2, \ldots, N_K - 1, N_K\}$. This means that we can construct the test-statistic
by considering a sequence of i.i.d. r.v.'s, but in addition we restrict the set of possible change points
to $\{N_k\}_{k=1}^{K}$, rather than one step change points. When the change point is located at $N_k$, the MLEs
of $\theta_0$ and $\theta_1$ are given by

\[ \hat\theta_{0,k}^{(K)} = \frac{Y_k}{N_k}, \qquad \hat\theta_{1,k}^{(K)} = \frac{Y_K - Y_k}{N_K - N_k}, \]

where

\[ Y_k = \sum_{i=1}^{k} X_i = \sum_{i=1}^{k} \sum_{h=1}^{n_i} X_{ih}. \]

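The likelihood ratio test-statistic is given by $S^{(K)} = \max_{k \in \{1, \ldots, K-1\}} S_k^{(K)}$, where

\[ S_k^{(K)} = N_k \left[ \hat\theta_{0,k}^{(K)} \log\frac{\hat\theta_{0,k}^{(K)}}{\hat\theta^{(K)}} + \big(1 - \hat\theta_{0,k}^{(K)}\big) \log\frac{1 - \hat\theta_{0,k}^{(K)}}{1 - \hat\theta^{(K)}} \right] + (N_K - N_k) \left[ \hat\theta_{1,k}^{(K)} \log\frac{\hat\theta_{1,k}^{(K)}}{\hat\theta^{(K)}} + \big(1 - \hat\theta_{1,k}^{(K)}\big) \log\frac{1 - \hat\theta_{1,k}^{(K)}}{1 - \hat\theta^{(K)}} \right], \tag{15} \]

with $\hat\theta^{(K)} = Y_K / N_K$ denoting the MLE of the common parameter under $H_0$.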



Two important papers which cover $S^{(K)}$ are Worsley (1983) and Horváth (1989). The expression
they gave for $S_k^{(K)}$ is not exactly the same, but it is equivalent to (15) (see formula (3.22) in Horváth
and Serbinowska (1995)). Horváth (1989) found that a normalization of $S^{(K)}$
based on the Darling-Erdős formula,



G(^) 



2 log NkS(^) -2logNK-^ log log Nk + - log ir, 



is asymptotically an extreme value random variable with parameters $\mu = \log 2$ and $\beta = 1$.
In addition, in Theorem 1.2 of Horváth and Serbinowska (1995), a modified version of the likelihood
ratio test-statistic was given, $\tilde{S}^{(K)} = \max_{k \in \{1, \ldots, K-1\}} \tilde{S}_k^{(K)}$, where

^(K) _ Nk{NK-Nk) (K) 

The asymptotic distribution of $\tilde{S}^{(K)}$ is the supremum in $(0, 1)$ of a standard univariate Brownian
bridge (its probability distribution function is tabulated in Kiefer (1959)). We consider the version of
the Wald test-statistic ${}^{\epsilon}Q^{(K)} = \max_{k \in N(\epsilon)} Q_k^{(K)}$, with

\[ Q_k^{(K)} = \frac{N_k(N_K - N_k)}{N_K}\, \big(\hat\theta_{0,k}^{(K)} - \hat\theta_{1,k}^{(K)}\big)^2\, \hat{I}_F(\theta_0), \]

where the consistent estimator of $I_F(\theta_0)$ is given by



9 



i,fc 



lAo^ 



0) 






Nk 



Nk' 



Nk 



NK0iK) 



1 






+ 



NK-Nk 



Nk 



^S 



1 



oif 



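Finally, in order to give an explicit expression for divergence based test-statistics we focus
on a family of divergences, the power divergences (see Read and Cressie (1988)), for which
$\phi_\lambda(x) = \frac{1}{\lambda(1+\lambda)}\big(x^{\lambda+1} - x - \lambda(x-1)\big)$, if $\lambda(1+\lambda) \neq 0$, and $\phi_\lambda(x) = \lim_{\ell \to \lambda} \phi_\ell(x)$, if $\lambda(1+\lambda) = 0$; that is, for
each $\lambda \in \mathbb{R}$ we obtain a different divergence measure between the p.m.f.s $p_{\theta_0}$ and $p_{\theta_1}$,

\[ D_\lambda(p_{\theta_0}, p_{\theta_1}) = \frac{1}{\lambda(1+\lambda)} \left( \frac{\theta_0^{\lambda+1}}{\theta_1^{\lambda}} + \frac{(1-\theta_0)^{\lambda+1}}{(1-\theta_1)^{\lambda}} - 1 \right), \quad \text{if } \lambda(1+\lambda) \neq 0. \]

When $\lambda = 0$ the power divergence coincides with the so-called Kullback divergence

\[ D_0(p_{\theta_0}, p_{\theta_1}) = D_{Kull}(p_{\theta_0}, p_{\theta_1}) = \theta_0 \log\frac{\theta_0}{\theta_1} + (1-\theta_0) \log\frac{1-\theta_0}{1-\theta_1}, \]

and when $\lambda = -1$ the power divergence coincides with the modified Kullback divergence,
$D_{-1}(p_{\theta_0}, p_{\theta_1}) = D_{Kull}(p_{\theta_1}, p_{\theta_0})$. Hence, the shape of the power-divergence based test-statistics is
${}^{\epsilon}T_\lambda^{(K)} = \max_{k \in N(\epsilon)} T_\lambda\big(\hat\theta_{0,k}^{(K)}, \hat\theta_{1,k}^{(K)}\big)$, where

\[ T_\lambda\big(\hat\theta_{0,k}^{(K)}, \hat\theta_{1,k}^{(K)}\big) = 2\, \frac{N_k(N_K - N_k)}{N_K}\, D_\lambda\big(p_{\hat\theta_{0,k}^{(K)}}, p_{\hat\theta_{1,k}^{(K)}}\big), \]

that is

\[ T_\lambda\big(\hat\theta_{0,k}^{(K)}, \hat\theta_{1,k}^{(K)}\big) = \frac{2 N_k(N_K - N_k)}{N_K\, \lambda(1+\lambda)} \left( \frac{\big(\hat\theta_{0,k}^{(K)}\big)^{\lambda+1}}{\big(\hat\theta_{1,k}^{(K)}\big)^{\lambda}} + \frac{\big(1 - \hat\theta_{0,k}^{(K)}\big)^{\lambda+1}}{\big(1 - \hat\theta_{1,k}^{(K)}\big)^{\lambda}} - 1 \right) \quad \text{for } \lambda(1+\lambda) \neq 0, \tag{16} \]

and

\[ T_0\big(\hat\theta_{0,k}^{(K)}, \hat\theta_{1,k}^{(K)}\big) = 2\, \frac{N_k(N_K - N_k)}{N_K} \left( \hat\theta_{0,k}^{(K)} \log\frac{\hat\theta_{0,k}^{(K)}}{\hat\theta_{1,k}^{(K)}} + \big(1 - \hat\theta_{0,k}^{(K)}\big) \log\frac{1 - \hat\theta_{0,k}^{(K)}}{1 - \hat\theta_{1,k}^{(K)}} \right). \tag{17} \]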
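The power-divergence statistics (16) and (17) for a binomial sequence can be computed directly from cumulative sums. The sketch below is an illustration under stated assumptions: the names `power_divergence` and `t_lambda_scan` are hypothetical, both MLEs are required to lie strictly inside $(0, 1)$ so that the logarithms and ratios are defined, and the candidate set is taken as the integers $k$ with $k/K \in [\epsilon, 1-\epsilon]$.

```python
import math
from itertools import accumulate
from typing import Sequence, Tuple


def power_divergence(t0: float, t1: float, lam: float) -> float:
    """Power divergence D_lambda between Bernoulli(t0) and Bernoulli(t1)."""
    if not (0.0 < t0 < 1.0 and 0.0 < t1 < 1.0):
        raise ValueError("both probabilities must lie strictly inside (0, 1)")
    if lam == 0:  # Kullback divergence
        return t0 * math.log(t0 / t1) + (1 - t0) * math.log((1 - t0) / (1 - t1))
    if lam == -1:  # modified Kullback divergence (arguments swapped)
        return power_divergence(t1, t0, 0.0)
    total = t0 ** (lam + 1) / t1 ** lam + (1 - t0) ** (lam + 1) / (1 - t1) ** lam - 1
    return total / (lam * (1 + lam))


def t_lambda_scan(
    x: Sequence[int], n: Sequence[int], lam: float = 2.0, eps: float = 0.05
) -> Tuple[float, int]:
    """Return (max over N(eps) of T_lambda, maximizing k) for counts x_i out of n_i."""
    K = len(x)
    Y = list(accumulate(x))  # Y_k = x_1 + ... + x_k
    N = list(accumulate(n))  # N_k = n_1 + ... + n_k
    results = []
    for k in range(1, K):
        if not eps <= k / K <= 1.0 - eps:
            continue
        Nk, Yk = N[k - 1], Y[k - 1]
        theta0 = Yk / Nk                      # MLE before the candidate point
        theta1 = (Y[-1] - Yk) / (N[-1] - Nk)  # MLE after the candidate point
        weight = 2.0 * Nk * (N[-1] - Nk) / N[-1]
        results.append((weight * power_divergence(theta0, theta1, lam), k))
    if not results:
        raise ValueError("no admissible candidate point")
    return max(results)
```

For example, with `x = [2]*5 + [8]*5`, `n = [10]*10`, `lam = 2.0` and `eps = 0.1`, the scan selects `k = 5` with statistic 98.4375, since $\hat\theta_0 = 0.2$, $\hat\theta_1 = 0.8$ and $D_2 = 1.96875$ at that point.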

Assuming that there is a monotone, continuous function $g$ such that $g(0) = 0$ and



hm max 

K^ookeN{€) 



Nk{NK-Nk) (k{K-k) 

9 



Nk " V ^ 



0, 



the asymptotic distribution of ${}^{\epsilon}Q^{(K)}$ and ${}^{\epsilon}T_\lambda^{(K)}$, for all $\lambda \in \mathbb{R}$, is the supremum in $[\epsilon, 1-\epsilon]$ of the
univariate tied-down Bessel process, i.e. (12) with $m = 1$. This assumption is very similar to the
assumption given in Horváth and Serbinowska (1995) for the asymptotic distribution of $\tilde{S}^{(K)}$.

A simulation study is performed in order to compare the accuracy of the proposed power divergence
type test with that of pre-existing test-statistics. In this context we apply the test-statistics $\tilde{S}^{(K)}$, $G^{(K)}$,
${}^{0.05}T_\lambda^{(K)}$ for three values of $\lambda$, and ${}^{0.05}Q^{(K)}$, with 5000 replications. The design is essentially the same as the
study performed in Horváth and Serbinowska (1995): $\theta_0 = 0.5$; three possible values of $K$ and nominal
sizes $\alpha$ are considered; apart from the quantiles of order $1-\alpha$, $x_{1-\alpha}$, the exact sizes $\hat\alpha$ are calculated.
With $K = \infty$, it is understood that $x_{1-\alpha}$ is the asymptotic quantile associated with the corresponding
test-statistic. Taking into account that the maximization for obtaining ${}^{\epsilon}T_\lambda^{(K)}$ and ${}^{\epsilon}Q^{(K)}$ with
$\epsilon = 0.05$ is over all possible integers $k \in N(\epsilon)$, we removed $k \in \{1, \ldots, K-1\}$ when $k < \epsilon K$ or
$k > (1-\epsilon)K$.

Looking at the results given in Table 1, the worst approximation of $\alpha$ is obtained with $G^{(K)}$.
The Wald test-statistic ${}^{0.05}Q^{(K)}$ is a good competitor for the test-statistic introduced in Horváth and
Serbinowska (1995), $\tilde{S}^{(K)}$. All the exact sizes underestimate the nominal size, which means that the
best approximation is obtained with the greatest value of $\hat\alpha$, hardly ever obtained with the power-
divergence based test-statistic with $\lambda = 2$, ${}^{0.05}T_2^{(K)}$.



Table 1: Exact simulated sizes. For each $K$, $x_{1-\alpha}$ is the quantile of order $1-\alpha$ and $\hat\alpha$ is the exact size.

                                 K = 64             K = 300            K = 500            K = ∞
Statistic                1-α     x_{1-α}   α̂        x_{1-α}   α̂        x_{1-α}   α̂        x_{1-α}
-------------------------------------------------------------------------------------------------
$\tilde{S}^{(K)}$        0.90     1.302   0.0664     1.386   0.0786     1.420   0.0860     1.498
                         0.95     1.619   0.0318     1.710   0.0372     1.740   0.0400     1.844
                         0.99     2.595   0.0094     2.484   0.0072     2.531   0.0074     2.649
$G^{(K)}$                0.90     1.707   0.0208     1.939   0.0260     2.011   0.0288     2.943
                         0.95     2.277   0.0076     2.431   0.0094     2.555   0.0118     3.663
                         0.99     3.394   0.0002     3.653   0.0002     3.796   0.0000     5.293
${}^{0.05}T_\lambda^{(K)}$  0.90  7.351   0.0654     7.881   0.0834     7.801   0.0824     8.31
                         0.95     8.968   0.0340     9.458   0.0412     9.514   0.0430     9.90
                         0.99    12.730   0.0086    13.143   0.0094    12.981   0.0078    13.45
${}^{0.05}T_\lambda^{(K)}$  0.90  7.374   0.0664     7.884   0.0832     7.809   0.0828     8.31
                         0.95     9.007   0.0352     9.464   0.0412     9.509   0.0432     9.90
                         0.99    12.851   0.0088    13.128   0.0094    12.981   0.0078    13.45
${}^{0.05}T_\lambda^{(K)}$  0.90  7.432   0.0688     7.911   0.0840     7.818   0.0834     8.31
                         0.95     9.141   0.0370     9.495   0.0416     9.519   0.0434     9.90
                         0.99    13.061   0.0094    13.129   0.0094    13.015   0.0084    13.45
${}^{0.05}Q^{(K)}$       0.90     7.311   0.0642     7.871   0.0822     7.800   0.0824     8.31
                         0.95     8.934   0.0334     9.441   0.0408     9.508   0.0430     9.90
                         0.99    12.742   0.0084    13.135   0.0094    12.973   0.0078    13.45
5 Numerical Example: Lindisfarne Scribes problem

The Lindisfarne Gospels are presumed to be the work of a monk named Eadfrith, who became Bishop
of Lindisfarne in the year 698. In the 10th century an Old English translation of the Gospels was made by
one or more scribes. Several statisticians have studied the problem of the number of
scribes who participated in the translation of the Gospels. This problem is known as the "Lindisfarne
Scribes problem".

In the framework of the model followed in the simulation study, the Lindisfarne Gospels
are considered to be divided into $K = 64$ consecutive sections (see Ross (1950) for more details).
It is supposed that each section could have been translated by one scribe and that the same scribe is
associated only with consecutive sections. Since the present indicative of Old English verbs admitted
several variants in its spelling, the custom of using these variants can serve as a key factor for
identifying different translators. Based on the data given in Table 2, $n_i$ is the total
number of times that the third person singular or second person plural appears in section $i = 1, \ldots, 64$,
and the observation $x_i$ (coming from the r.v. $X_i$) represents how many times the ending -s appears in these
verbs. Note that both the third person singular and the second person plural admit two endings, -s and -ð, and hence
if we want to know how many times the ending -ð appears in these verbs, the observations are obtained
as $n_i - x_i$, $i = 1, \ldots, K$. It is assumed that the custom of using both endings is different for each scribe,
and for this reason our interest is to find the consecutive changes in the probability structure of both
endings.
Since the proposed test-statistics are valid for single change-point detection, we now
describe the algorithm based on the binary segmentation procedure. In order to carry out a sequence of
hypothesis tests, it is convenient to use $\alpha = 0.01$ if we want to get a not very large upper bound for the
global significance level according to Bonferroni's inequality. Suppose that the power-divergence
based test-statistic with $\lambda = 2$ and $\epsilon = 0.05$, ${}^{0.05}T_2^{(K)}$, is our focus of interest. The algorithm based
on the binary segmentation procedure (Vostrikova (1981)) is described in Figures 1 and 2. We consider
$N(\epsilon) = \{3, \ldots, 61\}$ as change point candidates in Step 1, i.e. we have initially taken $\{1, \ldots, K-1\}$ but
we have removed all candidates $k$ such that $k < K\epsilon$ or $k > K(1-\epsilon)$. Once the values of $T_2\big(\hat\theta_{0,k}^{(K)}, \hat\theta_{1,k}^{(K)}\big)$ are
obtained for each candidate $k \in N(\epsilon)$, we select the maximizing argument, $k = 31$, which is
accepted as a change point because the p-value is less than 0.1. The p-values are calculated by following
(14). From now on we have to investigate how to divide $[1, 31]$ (Step 2) and $[32, 64]$ into segments. We
continue until all candidates have p-values greater than 0.1. After 12 steps it is concluded that
the Lindisfarne Gospels could have been written by seven scribes, because the obtained segments are
$[1, 10]$, $[11, 18]$, $[19, 23]$, $\{24\}$, $[25, 31]$, $[32, 52]$, $[53, 64]$. This conclusion differs slightly from the
conclusion obtained in Horváth and Serbinowska (1995), because the number of scribes they proposed
was one less and the locations of the change points are not exactly the same.
Figure 1: Binary segmentation procedure for the Lindisfarne's problem (part I). Step labels recoverable from the diagram: Step 4, p-value = 0.058; Step 5, p-value > 0.1.

Figure 2: Binary segmentation procedure for the Lindisfarne's problem (part II). Step label recoverable from the diagram: Step 11, p-value > 0.1.